Clusterability: A Theoretical Study

نویسندگان

  • Margareta Ackerman
  • Shai Ben-David
چکیده

We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model. We survey several notions of clusterability that have been discussed in the literature, as well as propose a new notion of data clusterability. Our comparison of these notions reveals that, although they all attempt to evaluate the same intuitive property, they are pairwise inconsistent. Our analysis discovers an interesting phenomenon; Although most of the common clustering tasks are NP-hard, finding a closeto-optimal clustering for well clusterable data sets is easy (computationally). We prove instances of this general claim with respect to the various clusterability notions that we discuss. Finally, we investigate how hard it is to determine the clusterability value of a given data set. In most cases, it turns out that this is an NP-hard problem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Theoretical Study of Clusterability and Clustering Quality

Clustering is a widely used technique, with applications ranging from data mining, bioinformatics and image analysis to marketing, psychology, and city planning. Despite the practical importance of clustering, there is very limited theoretical analysis of the topic. We make a step towards building theoretical foundations for clustering by carrying out an abstract analysis of two central concept...

متن کامل

Which Data Sets are ‘Clusterable’? – A Theoretical Study of Clusterability

We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model. We survey several notions of clusterability that have been discussed in th...

متن کامل

ar X iv : 1 50 1 . 00 43 7 v 1 [ cs . C C ] 2 J an 2 01 5 Computational Feasibility of Clustering under Clusterability Assumptions

It is well known that most of the common clustering objectives are NPhard to optimize. In practice, however, clustering is being routinely carried out. One approach for providing theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be cluster...

متن کامل

An Effective and Efficient Approach for Clusterability Evaluation

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them...

متن کامل

Computational Feasibility of Clustering under Clusterability Assumptions

It is well known that most of the common clustering objectives are NPhard to optimize. In practice, however, clustering is being routinely carried out. One approach for providing theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be cluster...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009